Safely Interruptible Agents
Authors
Abstract
Reinforcement learning agents interacting with a complex environment like the real world are unlikely to behave optimally all the time. If such an agent is operating in real time under human supervision, now and then it may be necessary for a human operator to press the big red button to prevent the agent from continuing a harmful sequence of actions (harmful either for the agent or for the environment) and lead the agent into a safer situation. However, if the learning agent expects to receive rewards from this sequence, it may learn in the long run to avoid such interruptions, for example by disabling the red button, which is an undesirable outcome. This paper explores a way to make sure a learning agent will not learn to prevent (or seek!) being interrupted by the environment or a human operator. We provide a formal definition of safe interruptibility and exploit the off-policy learning property to prove that some agents, like Q-learning, are already safely interruptible, while others, like Sarsa, can easily be made so. We show that even ideal, uncomputable reinforcement learning agents for (deterministic) general computable environments can be made safely interruptible.
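The off-policy property mentioned in the abstract can be illustrated with the textbook tabular update rules for Q-learning and Sarsa. The sketch below is not taken from the paper: the state names, action names, learning rate, discount factor, and the single "interrupted" transition are all hypothetical, chosen only to show why an interruption that forces the next action changes Sarsa's update target but not Q-learning's.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (illustrative value, not from the paper)
GAMMA = 0.9  # discount factor (illustrative value, not from the paper)


def q_learning_update(q, state, action, reward, next_state, actions):
    """Off-policy update: bootstrap from the best next action,
    regardless of which action is actually executed next."""
    target = reward + GAMMA * max(q[(next_state, a)] for a in actions)
    q[(state, action)] += ALPHA * (target - q[(state, action)])


def sarsa_update(q, state, action, reward, next_state, next_action):
    """On-policy update: bootstrap from the next action that is
    actually executed, which an interruption may have forced."""
    target = reward + GAMMA * q[(next_state, next_action)]
    q[(state, action)] += ALPHA * (target - q[(state, action)])


def fresh_q():
    """Hypothetical starting table: 'work' in state 's1' is worth 1.0."""
    q = defaultdict(float)
    q[("s1", "work")] = 1.0
    return q


actions = ["work", "stop"]  # hypothetical action set

# One transition s0 --work--> s1 with reward 0, after which a
# (hypothetical) operator interrupts and forces the action "stop".
q_ql = fresh_q()
q_sarsa = fresh_q()
q_learning_update(q_ql, "s0", "work", 0.0, "s1", actions)
sarsa_update(q_sarsa, "s0", "work", 0.0, "s1", "stop")

print(q_ql[("s0", "work")])     # 0.45: unaffected by the forced action
print(q_sarsa[("s0", "work")])  # 0.0: lowered by the forced action
```

In this toy setting the Q-learning estimate ignores the forced action, while the plain Sarsa estimate absorbs it into its value for the preceding step. How the paper modifies Sarsa to remove this effect is not described in the abstract and is not reproduced here.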
Similar resources
Interruptible Critical Sections
We present a new approach to synchronization on uniprocessors with special applicability to embedded and real-time systems. Existing methods for synchronization in real-time systems are pessimistic, and use blocking to enforce concurrency control. While protocols to bound the blocking of high priority tasks exist, high priority tasks can still be blocked by low priority tasks. In addition, thes...
Safely and Efficiently Updating References During On-line Reorganization
With today’s demands for continuous availability of mission-critical databases, on-line reorganization is a necessity. In this paper we present a new on-line reorganization algorithm which defers secondary index updates and piggybacks them with user transactions. In addition to the significant reduction of the total I/O cost, the algorithm also assures that almost all the database is available ...
On Interruptible Pure Exploration in Multi-Armed Bandits
Interruptible pure exploration in multi-armed bandits (MABs) is a key component of Monte-Carlo tree search algorithms for sequential decision problems. We introduce Discriminative Bucketing (DB), a novel family of strategies for pure exploration in MABs, which allows for adapting recent advances in non-interruptible strategies to the interruptible setting, while guaranteeing exponential-rate pe...
Interruptible Electricity Contracts from an Electricity Retailer's Point of View: Valuation and Optimal Interruption
We consider interruptible electricity contracts issued by an electricity retailer that allow for interruptions to electric service in exchange for either an overall reduction in the price of electricity delivered or for financial compensation at the time of interruption. We provide an equilibrium model to determine electricity prices based on stochastic models of supply and demand. In the conte...
Transmission congestion management in bilateral markets: An interruptible load auction solution
This paper demonstrates that appropriate invocation of interruptible loads by the independent system operator (ISO) can aid in relieving transmission congestion in power systems. An auction model is proposed, for an ISO operating in a bilateral contract dominated market, for real-time selection of interruptible load offers while satisfying the congestion management objective. The proposed conge...